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INTRODUCTION 

Wiener  introduced  a  fresh  approach  to  the  study  of  infor- 
mation transmission  in  the  presence  of  perturbing  noise  in  19L(.2. 
His  pioneering  work  showed  that  the  two  problems,  prediction  of 
random  signals,  and  separation  of  random  signals  from  random 
noise,  lead  to  the  Wiener-Hopf  integral  equation.   He  also  pre- 
sented the  solution  for  the  special  case  of  stationary  statis- 
tics and  rational  spectra.   Many  extensions  and  generalizations 
of  Wiener's  work  followed.   Most  of  the  work  has  been  done  in 
the  frequency  domain  using  transform  methods  to  obtain  the  spec- 
ifications of  a  linear  dynamical  system  which  accomplishes  the 
prediction  and  filtering  of  the  random  signals.   These  ideas 
form  the  so-called  "Classical  Filtering  Theory".   During  the 
past  seven  years,  many  new  techniques  and  concepts  have  been 
introduced  in  this  area  of  study  because  of  the  advances  in 
digital  computer  technology  and  the  challenge  of  aerospace  tech- 
nology.  Most  of  this  work  has  been  done  in  the  time  domain  using 
the  concepts  of  state  and  matrix  theory.   These  ideas  form  what 
is  called  "Modern  Filtering  Theory".   There  is  no  doubt  that 
Kalman's  work  is  the  most  notable.   In  i960,  Kalman  presented  a 
new  approach  to  the  standard  filtering  and  prediction  problems. 
His  work  combined  two  well  known  ideas.   One  is  that  a  dynamical 
system  is  described  by  the  "state-transition"  method.   The  other 
is  that  linear  filtering  is  regarded  as  an  orthogonal  projection 
in  Hilbert  space.   (Hilbert  space  is  a  Banach  space  whose  norm 
has  the  parallelogram  property,  Ref.  18.) 


He  also  assumed  that  the  required  statistical  data  is  given  in 
such  a  form  that  determination  of  the  optimal  filter  is  highly 
simplified,  with  a  single  equation  covering  all  cases.   This 
single  equation  is  called  the  variance  equation;  it  is  a  non- 
linear differential  (difference)  equation  of  the  Riccati  type. 
Hence  the  variance  equation  is  closely  related  to  the  Hamiltonian 
differential  equations  of  the  calculus  of  variations.   An  exact 
formula  for  the  solution  of  the  variance  equation  is  available. 
The  actual  solution  consists  of  the  specification  of  the  differ- 
ential equation  governing  the  optimal  filter. 

Wiener  described  the  random  process  by  its  power  spectral 
density  or  correlation  function.   Kalman  assumes  the  random  pro- 
cess to  be  Markovian.   In  other  words,  Kalman  describes  the 
linear  dynamic  system  by  a  set  of  first-order  differential  (or 
difference)  equations  and  the  Wiener  problem  is  approached  from 
the  point  of  view  of  conditional  distributions.   All  statistical 
calculations  and  results  are  based  on  means  and  covariances.   No 
other  statistical  data  are  needed. 

FUNDAMENTALS  OP  KALMAN  FILTERING 

Notation 

Vectors  will  be  denoted  by  small  underlined  letters: 
u,  v,  .  .  . ,  x,    y_,  z   with  coordinates  u^,  v-j_,  .  .  .,  x-j_,  y^,  z^. 
Matrices  will  be  denoted  by  underlined  Roman  and  Greek  capitals: 
F,  G,  .  .  . ,  (5,  P  with  elements  fji,  gji*  •  •  •>    Pij-   Tiie  unit 


matrix  is  I;  the  prime  denotes  the  transposed  matrix.   Constants 
will  be  denoted  by  small  Greek  letters.   Time  will  be  denoted  by 
t,  tg,  t-p  or  i.      These  may  be  arbitrary  real  numbers  (contin- 
uous time  case)  or  arbitrary  integers  (discrete  time  case).   The 

inner  product  of  x  and  y_  is  denoted  by  x'y.   The  norm  ||x||  i-s 

l/2  l/2 

(x'x)     or  (x,  x)    •   The  quadratic  form  x'Ax  ^s  denoted  by 

11  n2 

|x  J  A,  if  A  is  a  symmetric  nonnegative  definite  matrix.   Numer- 
ical quantities  will  always  be  real  numbers.   The  symbol  e(1 
denotes  the  expectation  operator.   Covariance  matrices  are  de- 
noted by  cov  |x,  x  and  covfx,  y_~|,  where 

cov  [x,  xj  =  Ef(.x  -  E  fx))(x  -  E  (x})  '! 

=  Efx,  x'}     if  e[x|  =  0. 
cov  [*>  z]  =  e{(x  -  Ejxj)(y_  -  E[ZJ)') 

=  E[xv_'|       if  Efxl  =  0   and  eM  =  0. 

Preliminaries 

A  linear  dynamical  system  governed  by  a  difference  equation 
(discrete  time  case)  can  always  be  described  in  the  standard 
form. 

x(t  +  1)  =|(t  +  1,  t)x(t)  +A(t  +  1,  t)u(t)      (1) 

The  output  equation  is 

£(t)  =  H(t)x(t)#  (2) 


The  observed  signal  is 

z(t)  =  j(t)  +  v(t)  =  H(t)x(t)  +  v(t)  #  (3) 

A  linear  dynamical  system  governed  by  an  ordinary  differ- 
ential equation  (continuous  time  case)  can  always  be  described 
in  the  standard  form. 

dx 

—  =  F(t)x  +  G(t)u(t)  (k) 

dt 

The  output  equation  is 

y_(t)  =  H(t)x(t)  §  (5) 

The  observed  signal  is 

z(t)  =  y(t)  +  v(t)  =H(t)x(t)  +v(t)<  (6) 

In  both  cases,  x  is  an  n-vector,  called  the  state.   The  co- 
ordinates x±   of  x  are  called  state  variables.   u(t)  is  an  m- 
vector,  called  the  control  function.   It  is  the  input  of  the 
system.   v(t)  is  a  p-vector.   It  is  an  additional  input.   It 
reflects  the  fact  that  physical  measurement  of  observables  can 
never  be  made  with  infinite  precision.   It  is  the  measurement 
noise.   v_(t)  is  the  output  of  the  system.   It  is  also  a  p-vector 
whose  components  are  linear  combinations  of  the  state  variables. 
_z(t)  is  the  observed  value  of  the  output  of  the  system.   It  is 
a  p-vector  too.   P(t)  is  an  n  x  n  matrix.   The  structure  of  this 
matrix  decides  the  nature  of  the  state  transition  matrix;  thus 
the  nature  of  all  solutions,  whether  forced  or  unforced,  depend 
upon  this  matrix.   G(t)  is  an  n  x  m  matrix.   It  is  a  coupling 


matrix,  as  the  structure  of  this  matrix  determines  how  the  input 
is  coupled  to  the  various  state  variables.   H(t)  is  a  p  x  n 
matrix.   It  is  also  a  coupling  matrix,  coupling  the  state  vari- 
ables to  the  output.   (j)(t)  is  a  transition  matrix.   Its  proper- 
ties will  be  discussed  later.  Mt)    is  an  n  x  m  matrix.   If  u(t) 
is  piecewise  constant,  then 

,t+l 
A(t  +  1,  t)  =  f    $(t  +  1,  i:)G(T)dT:  .  (7) 

't 

If  Q>    H>  Z1  8re  constant,  then  the  system  is  said  to  be 
stationary;  if  u(t)  =  0,  then  the  system  is  free. 

The  general  solution  of  equation  (1^.)  has  the  form 

rt 

x(t)  =  $(t,  t0)x(t0)  +  J        f(t,    T)G('r)u('u)dT  ,    (8) 

t0 
where  the  transition  matrix  is  characterized  by  the  properties 

(Ref.  10) 

d 

—  <jj(t,  t0)  =  F(t)$(t,  t0)  ,  (9) 

dt  ~ 

|(t0,  t0)  =  I       for  all  t0  ,  (10) 

inverse  rule 

|"  0<  =  |(t0,  tx)  for  all  t0,  tx  ,  (11) 
product  rule 

|(t2,  t0)  =  |(t2,  t1)|(t1,  t0)  .  (12) 

The  last  property  of  <j)  justifies  its  being  called  a 
transition  matrix.   Because  of  the  manner  in  which  (£  is  defined, 
this  matrix  can  never  be  singular. 


If  F  is   a   constant  matrix,  .then 
|(t,    t0)    =   exp  P(t    -    t0)    =    'Z 


(F(t  -  to)]1 

—     (13) 


i=0      i  J 

There  is  no  simple  way  to  compute  <j)  explicitly  when  P  is 
not  constant. 

The  first  term  of  equation  (8)  represents  the  initial  con- 
dition response  of  the  system  state  variables.  The  second  term 
represents  the  forced  response. 

If  the  u(t)  is  piecewise  constant,  then  any  linear  differ- 
ential equation  may  be  converted  into  a  linear  difference  equa- 
tion in  such  a  way  that  at  integer  values  of  time  the  solutions 
of  the  differential  equation  agree  with  the  solution  of  the 
difference  equation. 

Statement  of  the  Problem 

The  statistical  correlation  between  random  signals  observed 
over  some  interval  of  time  is  explained  by  the  presence  of  a 
dynamic  (linear  or  nonlinear)  system  between  the  primary  random 
source  and  the  observer.   Hence  random  processes  may  be  thought 
of  as  the  output  of  a  dynamic  system  (linear  or  nonlinear)  ex- 
cited by  an  independent  gaussian  random  process.   If  the  ob- 
served random  signal  z(t)  is  also  gaussian,  one  may  assume  that 
the  dynamic  system  which  is  between  the  observer  and  the  pri- 
mary source  u(t)  is  linear. 

Consider  a  linear  dynamical  system  subjected  to  random  dis- 
turbances and  measurement  noise.   Assume  that  the  state  x(t)  of 


the  system  cannot  be  observed  directly  but  only  through  the  out- 
put y_(t),  which  can  be  measured  only  in  the  presence  of  additive 
gaussian  white  noise  v(t).   In  addition,  the  system  is  subjected 
to  a  random  input  disturbance  in  the  form  of  gaussian  white 
noise  u(t).   Assume  that  physical  relationships  between  the  state 
x(t)  and  the  driving  noise  u(t)  and  between  the  observed  values 
_z(t)  and  the  state  x(t)  and  the  measurement  noise  v(t)  are  given. 
The  statistical  characteristics  of  the  driving  noise  u(t)  and 
measurement  noise  v(t)  are  also  assumed.   Then,  from  the  actually 
observed  values,  z(i) ,  over  some  interval  of  time,  t0  ^  t  ~  t, 
one  wants  to  find  the  optimal  (in  some  sense)  estimate  x^-,)  of 
x(t,)  at  time  t-,  .   In  the  case  where  t-, .  <  t,  the  problem  is  re- 
ferred to  as  a  data-smoothing  (interpolation)  problem;  if 
t-,  =  t,  it  is  called  the  filtering  problem;  and  if  t-,  >  t,  it 
is  called  the  prediction  (extrapolation)  problem.   Collectively, 
these  cases  are  referred  to  as  the  estimation  problem. 

Kalman  made  three  key  assumptions,  which  are:   (A-,)  The 
original  message  x(t)  is  assumed  to  be  a  random  process  gener- 
ated by  a  mathematical  model  of  the  type  described  by  either 
equation  (1)  or  equation  (ij.)  .   (A2)  The  observed  signal  _z(t)  is 
an  additive  combination  of  the  output  signal  and  a  white  noise 
described  either  by  equations  (3)  or  (6).   (Ao)  The  measurement 
of  _z(t)  starts  at  some  fixed  instant  tQ  (which  may  be  -  °^  ,  at 


which  time   cov 


x(tQ)  ,  x^q) 


i-s  known. 

From  these  three  assumptions,  there  are  two  cases  to  be  dis- 
cussed.  These  are  the  case  where  the  message  is  a  Gauss-Markov  Se- 
quence and  the  case  where  the  message  is  a  Gauss-Markov  Process. 
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The  message  is  a  Gauss-Markov  Sequence  generated  by  the 
recursive  relation 

x(t  +  1)  =  <jj(t  +  1,  t)x(t)  +  A(t  +  1,  t)u(t)  t  (1) 

The  observed  signal  (which  is  message  plus  noise)  is 

z(t)  =  y_(t)  +  v(t)  =  H(t)x(t)  +  v(t)  m  (3) 

u(t)  is  a  gaussian  white  noise  sequence;  u(t)  evaluated  at 
different  times  is  independent.   Hence 


cov 


u(t1),  u(t2)]  =  0 


if  tx  /  t2  . 


(11;) 


u(t)  has  zero  mean,  that  is, 


E  [u(t)j  =  0 


Then  the  covariance  matrix  is 


for  all  t 


(15) 


cov 


u(t),  u(t) 


Q(t)    for  all  t. 


(16) 


x(t)  is  a  gaussian  random  sequence  with  zero  mean  and  arbi- 
trary covariance,  independent  of  u(t).  x(t)    satisfies  the  Mar- 
kov property,  that  is,  the  conditional  probability  distribution 
of  x(t)  ,  given  x(i) ,    t^   ^  i    <t,  is  identical  with  the  proba- 
bility distribution  of  x(t)  given  the  last  observation  x(t  -  1) 
(Ref.  6). 

Pr  fx(t)  <  Z^  (  x(t  -  2)  .  .  .  x(tQ)j 

=  Pr(x(t)  <  Z1   |x(t  -  1))  (17) 

Hence  x(t)  is  a  Gauss-Markov  sequence. 


" 


if  t]_  j-   t2 

(18) 

for   all    t 

(19) 

(20) 

for   all   t 

(21) 

•  _z(t)  is  also  a  gaussian  random  sequence,  because  gaussian 
random  signals  remain  gaussian  after  passing  through  a  linear 
system. 

v(t)  is  also  a  gaussian  white  noise  with  zero  mean  and  co- 
variance  R(t),  independent  of  u(t). 

covpvU-j^) ,  v(  t2)J  =  0 

E  {v(t)|  =  0 

cov[v(t),  v(t)]  =  R(t) 

cov|~u(t)  ,  v(t)  J  =  0 

Thus  a  random  sequence  may  be  thought  of  as  the  output  of 
a  dynamical  system  excited  by  two  independent  gaussian  random 
sequences. 

A  Gauss-Markov  Process  is  the  limiting  case  of  Gauss-Markov 

sequence  when  the  interval  between  successive  values  of  time 

tends  to  zero.   Then  equations  (l)  and  (3)  convert  to  the  con- 
tinuous time  case  as  follows: 

dx 

—  =  F(t)x(t)  +  G(t)u(t)  (M 

dt  ~  ^' 

z(t)  =  y_(t)  +  v(t)  =  H(t)x(t)  +  v(t)  (6) 

The  random  processes  u  and  v  are  defined  in  such  a  way  that  at 
integer  values  of  time  the  random  processes  x  and  z  generated  by 
equations  (i|)  and  (6)  agree  with  those  generated  by  equations 
(1)  end  (2).   Hence  the  sample  functions  are  assumed  to  be  piece- 
wise  constant  over  intervals  of  length  1. 
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u(T)    =  u(T   +  t) 
v(T)    =  v(T  +  t) 
where  T   is   an  Integer  and 

0  £  T  «  1. 

It  is  assumed  that 


E  {u(t)J  =  0 
E  (v(t)}  =  0 


E 


(u(t),  u(t))  =  Q(t)6(t  -  t) 
ov[v(t),  v(t)]  =  R(t)6(t  -  t) 


and 


oov  v(t),  u(t) 


=  0 


for  all  t,  i 


(22) 
(23) 


6(t)  is  the  Dirac-delta  function,  which  has  the  following 
characteristic  properties  (Ref .  3) : 


a  r  1 

6(t  -  t0)  =  —  l(t  -  t0) 

dt  L         -' 

where  l(t)  is  a  unit  step  function,  and 

J    6(t  -  t0)dt  =     6(t  -  tQ)dt  =  1/2 

'-o©  'tQ 


(2k) 


(25) 


and 


■CO 

(     f(t)6(t  -  t0)dt  = 


f(t)6(t  -  t0)dt  = 


f(t)6(t0 


1 

"  f(t0) 

2 


t)dt  =  f(t)     (26) 


(27) 


if  t-j_  =  tQ  ,  or   tg  =  tQ 


12 


The  values  of  the  sample  functions  of  u  and  v  are  to  be 
regarded  as  Dirac-delta  functions  of  vanishing  small  areas. 
Mathematically  speaking,  this  definition  of  course  is  not  rigor- 
ous, since  6(t)  is  not  a  well  defined  function. 

The  block  diagram  of  this  system  is  shown  in  Fig.  2. 

The  estimation  problem  is  formulated  as  follows.   Given  the 
actually  observed  value  of  a  random  process  _z(t)  over  some  in- 
terval of  time,  tQ  ^  t  ^  t,  find  the  optimal  estimate,  x(t^), 
of  another  related  random  process  x(t,  )  such  that  the  estimate 
x(t-i)  minimizes  the  expected  losses 

E  |e  (L(x(t1)  -  $(t1))|z(T),  tQ£  T  «  tjj  (28) 

or  equivalently 

E|L(x(t1)  -  x(t1))|z(T),  tQ  <:   t  <  t  )   .  (29) 

The  loss  function  L(£)  is  a  scalar  valued,  positive,  non- 
decreasing  function  of  the  estimation  error 

5  =  2(ti)  '  ^(tl}  ' 
The  loss  function  L(£)  must  satisfy 

L(0)  =  0 

L(£1)  ^  L(§2)  ^  0 


where    £, 


^  II §2  1^  °  • 


Let  x  be  an  n-dimensional  random  vector  with  mean  5  an<^  distri- 
bution function  F(  5  ) .   Suppose  F( 3  )  has  the  following  proper- 
ties.  F  is  symmetrical  about  its  mean  %  . 
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F(J  -  3)  =  1  -  P(l  -  I) 

convex  for     5.    ^  ^. 

P(\  Jn  +   (1  -  X)512)  *   XP(  J  n)    +   (1  -   \)P(  5i2) 

for  all  ^  lx,  ?i2  <^  31    and   0  ^  \  ^   1  . 

Then  the  estimate  x(t-i)  which  minimizes  the  expected  loss 
is  the  conditional  expectation 

x(t1)  =  x(t1|t)  =  E  f  x(tx)|  z(t),   t0^  t  £  t?     (30) 

If  the  loss  function  is  defined  by  the  mean  squared  error, 
i.e., 

L(£)  =  (i,  €)  =  ([x  -  |],  [x  -  x])  =  r  (Xi  -  Xi)2 

then  the  optimal  estimate  is  also  the  conditional  expectation, 

x(tx)  =  x(ta|t)  =  Efx(t1)  z(t)   t0<^  t  ^  t) 

without  imposing  the  restrictions  that  the  distribution  function 
F(  5  )  be  both  symmetric  and  convex  (Ref .  8) . 

The  conditional  mean,  which  supplies  the  minimum  expected 
loss  for  many  loss  functions,  plays  an  important  role  in  the 
filtering  problem.   Of  course,  if  the  conditional  distribution 
is  known,  the  optimal  estimate  x(t)  can  be  computed  for  any 
loss  function. 

A  g8ussian  distribution  is  both  convex  and  symmetric  about 
the  mean.   It  is  obvious  that  the  optimal  estimate  is  always  the 
conditional  expectation  £(1^  t) .   The  conditional  probability 
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distribution  of  a  gaussian  process  is  completely  described  by 
its  mean  and  covariance.   If  in  addition  the  process  is  also 
Markovian,  then  it  suffices  to  know  the  mean  and  covariance  at 
one  instant  of  time.   The  calculation  of  the  optimal  estimate 
involves  only  the  means  and  covariance  matrices  of  the  gaussian 
process.   Thus  x(t^)=  xCt^lt)  is  clearly  the  optimal  estimate 
for  the  class  of  all  of  the  random  processes  with  the  same  means 
and  covariance  matrices  as  the  gaussian  process. 

In  the  gaussian  process,  the  optimal  estimate  x(t,)-  of 
x(  t,  )  is 

x(tx)  =  iUilt)  =  EJx(t1)|  z(t),  t0  £  T  <  t)       (31) 

where   x(t-ift)  is  an  unbiased  estimate  of  x(t]_).   That  is, 

EfxU-Llt)}  =  Efx(t]_)J  =  0. 
Define 

x(t1|t)  =  x(t1)  -  x(ta  t)  (32) 


where  x(t-,  t)  is  the  error  between  the  actual  value  x(  t-,)  an^ 
its  conditional  expectation.   The  conditional  expectation  of 
gaussian  random  process  x(t)is  identical  with  the  orthogonal 
projection  of  x(  t-,  )  upon  the  sample  space  _z(T),t0  ^  t  ^  t 
(Ref.  8).   Hence  the  optimal  estimate  x(t-,[t)  of  x(t-j)  Is  ^he 
orthogonal  projection  x(t-i)  on  a  linear  manifold  or  vector  space, 
generated  by  z  (t)  ,  tQ  •£  t  5  t.   x(t,  t)  will  minimize  the  ex- 
pected loss, 

E-[L(x(t)  -x(t|t))|  =e["Z  [x±(t)  -  xi(t|t)]2j 


16 


That  will  be  discussed  in  more  detail  in  the  continuous  case. 

By  gaussianness,  x(t]_|t)  and  xtt^Jt)  are  independent  random 
variables.   The  covariance  matrix  of  x(t-j_|t)  is  defined  by 

PU-Jt)  =  cov[x(t1|t),  xCt^t)]  (33) 

The  quantities  v(t|t),  v(tjt),  u(tjt)  .  .  .  z(t|t)  are 
defined  similarly. 

Solution  of  the  Problem 

In  a  Gauss-Markov  sequence  the  message  is  a  random  sequence 
generated  by  the  equation 

x(t  +  1)  =  |(t  +  1,  t)x(t)  +  A(t  +  1,  t)u(t)  . 

Repeated  use  of  the  above  equation  yields 

x(t  +2)  =  |(t  +  2,  t  +  l)x(t  +1)  +  A(t  +  2,  t  +  l)u(t  +  1), 

x(t  +  3)  -  |(t  +  3,  t  +  2)x(t  +  2)  +  Mt   +   3,    t  +  2)u(t  +  2) 
=  |(t  +  3,  t  +  l)x(t  +  1) 

+  |(t  +  3,    t  +  2)A(t  +  2,  t  +  l)u(t  +  1) 
+  |(t  +  3,  t  +  3)A(t  +  3,  t  +  2)u(t  +  2)  , 
x(t  +  k)   =  |(t  +  k,    t  +  3)x(t  +  3)  +  4(t  +  l+,  t  +  3)u(t  +  3) 
=  |(t  +  i;,  t  +  l)x(t  +  1) 

+  Mt  +  k,    t  +  2)a(t  +  2,  t  +  l)u(t  +  1) 
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+  |(t  +  k,    t   +   3)A(t   +   3,  t   +    2)u(t   +    2) 

+  |(t  +  k,   t  +  i|.)A(t  +  I4.,  t  +  3)u(t  +  3) 

1  (t+l^-l 

=  $(t  +  1^,    t  +  l)x(t  +  1)    +  T~ 


TT=t+l 


|(t  +  I4.,   ^  +   1)A(t1  +  1,    t1)u(t1)    . 


Hence   by   induction 


x(tx)    =  |(tx,    t  +  l)x(t  +  1) 

T-]_=t+l 

for   t-j_  ^    t   +   2 


tJS1      T 

II  $(t!,     T1    +    1)4(T1    +    1,     T1)u(T1) 


(3k) 


Taking  conditional  expectation  of  both  sides  with  respect  to 
z(t),  tQ  <   t  <r  t  ,  yields 

E|x(t1)  |z(t),    t0  <:  t  ^  t J  =  E^t-^    t  +   l)x(t  +   1) 

tl"1     "~  % 

+  Z  l(tl,     ^1    +     D^(T1    +    l,T1)U(T1)      Z(T),     tQ<T,<t 

T1  =  t+1    ~  "~  U  ) 

since  u(t)  is  independent  of  z(t)  ,  tQ  «  t  <•  t,  and  eTuCt-,)]  =  0. 

x(tx|t)  =  |(tx,  t  +  l)x(t  +  1  t)  for  tx>  t  +  1  .  .  .  (vd). 


Hence 


This  equation  applies  to  the  optimal  prediction  (extrapo- 
lation) problem..  So  far,  no  similarly  simple  formula  is  known 
for  data  smoothing  (interpolation)  case,  for  t-,  <^  t. 

Supposing  x(t[t  -  1)  is  known.   x(t  +  l|t)  can  be  computed 
by  induction. 

Consider  the  linear  manifold  M(t)  generated  by  the 
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observation  z(t0),  .  .  . ,  z(t).   This  manifold  can  be  decomposed 
into  two  parts.   One  part  is  the  space  M(t  -  1)  generated  by 
z(t0),  z(t0  +  1)  .  .  .  z(t  -  1),  while  the  second  part  is  the 
space  N(t)  generated  by  ^(t|t  -  1),  the  component  of  z(t)  that 
is  orthogonal  to  M(t  -  1). 


z(t|t  -  1)  =  z(t)  -  H(t)x(t|t  -  1)  =  H(t)x(t|t  -  1)  +  v(t) 


z 


Two  linear  manifolds  M(t  -  "l)  and  N(t)  are  said  to  be 
orthogonal,  if  every  vector  in  one  manifold  is  orthogonal  to 
every  vector  in  the  other  manifold.   The  two  sets  of  gaussian 
random  variables  are  independent;  hence  the  conditional  expecta- 
tions may  be  computed  separately. 

E(x(t  +  l)JM(t)]  =  Efx(t  +  1)  |M(t  -  1)J  +  Efx(t  +  l)|N(t)} 

Ejx(t  +  1)1  z(t),  t0  ^  -u  ^  t   =  E^xU  +  1)|  z(t),  t0<  T  £  t  -  lj 

+  EJxU  +  1)  z(tft  -  1)\ 


Then 


$(t  +   l|t)  =  $(t  +  1,  t)x(t|t  -  1) 

+  A(t  +  1,    t)Efu(t)|M(t  -  1)} 
+  E/x(t  +  1)|  jz(t|t  -  1)  j  . 
Since  E(u(t)[M(t  -  1)|  =  0  , 

x(t  +  l|t)  =  $(t  +  1,  t)x(t|t  -  1)  +  E^x(t  +  l)|z(t[t  -  1)}  . 


(35) 


By  gaussianness  and  knowing  that  (Ref.  9) 


• 
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E^x(t  +   l)|l(t|t   -    1)}    =  EJx(t  +   1)} 

+   cov[x(t  +   1),  1s(t|t   -   l)~|covFz(t|t   -   1),   f(t|t   -   1)J 

r 

•  rz(t|t  -  D  -  Efz(t|t  -  1)1 


-l 


one   can  write 


Efx(t  +   l)||(t|t   -   1)1    =  covjx(t  +   1),    ^(t|t   -   1)1 


r  ~i-i 

cov rz(t   t   -    1)  ,  'zU   t   -    1)  I 


•  riuit  -  d] 


'j 


(36) 


since 

EJx(t   +   1) 
and 


Efz(t[t    -   1)]    =  E"/H(t)x(t[t   -   1)    +   v(t)?    =   0    . 

The   first   factor   on   the   right-hand   side   of   equation    (3&)    can 
be  written 


cov 


jx(t   +   l),  ^z(t|t   -   1)1    =   cov["$(t   +   1,    t)x(t) 
+  A(t  +   1,    t)u(t),    H(t)x(t|t   -    1)    +  v(t)J 
("$(t  +   1,    t)x(t),    H(t)x(tlt   -    1) 


=   cov 


L^ 


Ti 


+   cov     (j)(t  +   1,    t)x(t),    v(t) 

+   cov|£(t  +   1,    t)u(t),    H(t)x(t|t   -    1) 

+   cov|/>(t  +   1,    t)u(t),    v(t) 


which  reduces  to 
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cov[~x(t   +   1),    zH|t   -    1)J 

=   cov  [|(t  +   1,    t)x(t),    H(t)x(t|t   -    1)] 
=  E  (|(t   +-1,    t)x(t)x'(t|t   -    l)H»(t)j 
=  |(t   +   1,    t)E/x(t)x'(t|t   -    l))H'(t) 


since 


covjx(t),    v(t)J   =   covf"u(t),    x(t[t   -   1)]    =  cov[u(t),    v(tf[   =  0 


and    since 


EJx(t)\    =  E/x(t|t    -    1)1    =0. 

Furthermore,    in  view   of   the   fact   that 
x(t)    =  x(t|t   -    1)    +  x(t|t   -   1) 

and   the   fact    that 

Efx(t|t   -   l)x»(t|t   -    1)]    =  0    , 

the  first  factor  of  the  equation  can  be  written 
covfx(t  +  1),  z(t[t  -  1)1 

=  |(t  +  1,  t)EJf(tjt  -  l)x'(t|t  -  l)JH'(t) 
=  $(t  +  1,  t)P(t|t  -  l)H'(t)  . 


(37) 

The  second  factor  on  the  right-hand  side  of  equation  (36)  can 
be  reduced  in  a  similar  manner  to 
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cov 


[V(t|t  -  i),  |(t|t  -  d] 

=  cov[*H(t)x(tJt  -  1),  H(t)x(t|t  -  1)J 

+  cov  H(t)x(t|t  -  1)  ,  v(t)J 
•cov  v(t),  H(t)x(t[t  -  1)J  +  cov  v(t),  v(t)j 
=  H(t)cov[x(t[t  -  1),  x(t|t  -  l)lH'(t)  +  covj^(t),  v(t) 
=  H(t)P(t|t  -  l)H'(t)  +  R(t)  .  (38) 

Substituting  equations  (37)  and  (38)  into  equation  (36)  yields 
EJxU  +  l)fz(t|t  -  l)j 

=  [|(t  +  1,  t)P(t|t  -  l)H'(t)] 
•[H(t)P(tft  -  l)H'(t)  +  R(t)]~ 

.[z(t)  -  H(t)x(t|t  -  1)]  (39) 

Combining  equations  (35)  and  (39)  yields 
x(t  +  l|t)  =  J(t  +  1,  t)x(t  t  -  1) 

|(t  +  1,    t)P(t|t  -  l)H'(t)| 
H(t)P(t|t  -  l)H'(t)  +  R(t) I"1 
z(t)  -  H(t)x(t|t  -  1)1 

=  *£{t   +   1,  t)x(t|t  -  1)  +  K(t)z(t)  (Id) 

where 

V(t   +   1,  t)  =  |(t  +  1,  t)  -  K(t)H(t) 
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and 


K(t)  =  [*$(t  +  1,  t)P(t|t  -  l)H'(t)] 

L—  ~   J 

H(t)P(t|t  -  l)H'(t)  +  R(t)]"1 
From  equation  (32)  one  can  obtain  the  equation 
x(t  +  1  t)  =  x(t  +  1)  -  x(t  +  lit)   . 


(II Id) 


(U-0) 


Substituting  equations  (1)  and  (Id)  into  equation  (1+0)  yields 
x(t  +  l|t)  =  .|(t  +  1,  t)x(t)  +  A(t  +  1),  t)u(t) 
-  y(t  +  1,  t)x(t|t  -  1)  -  K(t)z(t) 


=  V(t  +  1,  t)x(t|t  -  1) 

+  Mt  +   1,    t)u(t)"  -  K(t)v(t)   . 


(lid) 


The  solution  of  the  filtering  problem  can  be  computed  by 
deriving  a  recursion  relation  for  the  conditional  covariance 
matrix  P(tjt  -  1),  which  is  the  only  remaining  unknown  in  equa- 
tions (Id)  and  (Hid)  . 


P(t  +  ljt)  =  cov[x(t  +  l|t),  x(t  +  l|t)j 


=  cov 


£(t  +  1,  t)x(t|t  -  1) 
+  Mt   +   1,  t)u(t)  -  K(t)v(t)  , 
£(t  +  1,  t)x(t|t  -  1)  +  A(t  +  1,  t)u(t) 

-  K(t)v(t) 


covmt  +  1,  t)x(t  |t  -  1) , 
f(t  +  1,  t)x(t|t  -  1)] 
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+  cov[k(t)v(t),  K(t)v(t)J 

+  cov[^(t  +  1,  t)u(t),  A(t  +  1,  t)u(t)] 

=  £(t  +  1,  t)P(t  t  -  l)V'(t  +  1,  t) 

+  K(t)R(t)K'(t)  +  Mt   +   1,  t)Q(t)A'(t  +  1,  t)  (1+1) 

The  first  two  terms  of  equation  (1+1)    are 

V(t  +  1,  t)P(t|t  -  l)Y'(t  +  1,  t)  +  K(t)R(t)K'(t) 

=  [|(t  +  1,  t)  -  K(t)H(t)]p(t|t  -  1) 

[j>'(t  +  1,  t)  -  H'(t)K'(t)J+  K(t)R(t)K'(t) 

=  |(t  +  1,  t)P(t|t  -  l)|'(t  +  1,  t) 
-K(t)H(t)P(t|t  +  l)|'(t  +  1,  t) 
-  |(t  +  1,  t)P(t|t  -  l)H'(t)K'(t) 
+  K(t)H(t)P(t|t  +  l)H'(t)K'(t)  +  K(t)R(t)K'(t) 


=  |(t  +  1,  t)P(t|t  -  l)|'(t  +  1,  t) 

-  K(t)H(t)P(t[t  -  l)|'(t  +  1,  t) 

-  |(t  +  1,  t)P(t|t  -  l)H'(t)K'(t) 

+  K(t)[H(t)P(t|t  -  l)H'(t)  +  R(t)JK'(t)  (1+2) 

Substituting  equation  (Hid)  into  the  last  two  terms  of 
equation  (1+2) , 

K(t)[H(t)P(t|t  -  l)H'(t)  +  R(t)]K'(t) 

-  |(t  +  1,  t)P(t|t  -  l)H'(t)K'(t) 

=  [|(t  +  1,  t)P(t|t  -  l)H»(t)][H(t)P(t|t  -  l)H'(t)  +  R(t)]"1 
[H(t)P(t|t  -  l)H'(t)  +  R(tj|K'(t) 
-  l(t  +  1,  t)P(t|t  -  l)H'(t)K'(t) 
=  0  . 


2h 


Then   the   first    two   terms   of   equation    (lj.1)    are 

t 

iit   +   1,    t)P(t|t    -    l)Y'(t   +   1,    t)    +  K(t)R(t)K'(t) 

=  §(t   +    1,    t)P(t|t   +   l)|'(t   +   1,    t) 

-  K(t)H(t)P(t|t    -    l)ijj'(t   +    1,    t) 

=  $(t   +    1,    t)/p(t|t    -    1) 

-  [P(t|t    -    l)H'(t)]  [H(t)P(t[t    -    l)H'(t)    +   R(t) 
•  H(t)P(t|t    -    1)]  •   J'(t   +    1,    t) 

Hence  equation  (I4.I)  becomes 

P(t  +  ljt) 

=  |(t  +  1,  t)fp(t|t  -  1)  -  [P(t|t  -  l)H'(t)] 
.  [H(t)P(t|t  -  l)H'(t)  +  R(t)]-1  H(t)P(t|t  -  1)\ 
•  f'(t  +  1,  t)  +  A(t  +  1,  t)£(t)A'(t  +  1,  t) 


--1 


(k3) 


(IVd) 


Equations  (Id)  to  (Vd)  are  the  solutions  for  a  Gauss- 
Markov  sequence.   They  will  be  discussed  later. 

In  a  Gauss-Markov  process  the  message  is  a  random  process 
generated  by  the  equation 

dx 

—  =  F(t)x  +  G(t)u(t) 

dt   "   ~   "   ~ 

and  the  observed  value  is 


z(t)  =  y_(t)  +  v(t)  =  H(t)x(t)  +  v(t)  < 

The  solution  of  this  problem  can  be  directly  derived  from 
the  Wiener-Hopf  equation.   Pugachev  pointed  out  that  the  Wiener- 
Hopf  equation  is  nothing  more  than  a  special  case  of  the 
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orthogonal-projection  theorem  (Ref.  20). 

The  orthogonal-projection  lemma  can  be  stated  as  follows 
A  necessary  and  sufficient  condition  for 


is  that 


X  -  W   ^   X  -  Wq 


(*  "  ™0>  —^    ~   ° 


for  all  w  in  W 


for  all  w  in  W  . 


v 


(kh) 


ik-S) 


Moreover,  if  there  is  another  vector  w-,  satisfying 


then 


(x  -  w-j_,  w)  =  0 


[Wq    -    W     ||=    0 


where   x  is  an  element  of  abstract  space  % 
W  is  a  subspace  of  *]£- 
w  is  an  element  of  W. 


Proof: 


2S  "  ™[j'  =  2E  "  HqII   +  z^x 


2  "  ™o>  -0  "  H)  +  || 2£  ~  ^0||2    (^ 


Since 


and 


(*  "  ™o>  Ho  "  — ^  =  ^ 


w  -  w^ 


it  follows  that 


5  0 


5  -  £     £ 


x  -   w. 


Substituting  w  =  w^   into   equation    (I4.6)  , 


2  "  £1 


*  ~  Holl  !  + 


— 1   "  — 0 


(U-7) 
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Since  w,  satisfies  condition  (\\$) 

2 


x  -  w, 


2   l|       I 


+  2(x  -  wn  ,  W-,  -  wn)  + 


X  -  w,    + 


£        zLl>    SL2.    "  — 0' 


W/-\  -  W-. 


So  "  S: 


(1+8) 


Combining  equations  (lj_7)  and  (lj.8)  yields 


x  -  w, 


0 


2  -I! 


=   X  -  Wr 


Si  "  SO    + 


so  -   Si 


hence 


Sq  "  Si '  =  °  • 


Consider  a  vector  Wp  such  that 


(x  -  wn,  w?)  =  ^  ^  0  . 


Then 


x  -  wQ  -  Pw2   =  I  x  "  Sq 


-  2-<(3  +  j3' 


S2 


where  -<  is  given,  but  the  value  could  vary  with  (3.   For  suitable 
choice  of  [3,  the  last  two  terms  could  be  negative;  then 

.2  .  ,       „2 


x  -  w 


0 


PS- 


2  ~  Sol!   ' 


This  inequality  contradicts  the  optimality  of  wQ . 

The  optimal  estimation  problem  for  the  multidimensional 
situation  is  one  in  which  the  values  of  z{i) ,    tQ  ^  T^t  are 
given,  and  one  wishes  to  find  an  estimate  x(t-|_  t)  of  x(t-,)  hav- 
ing the  form 


vt 

X(tx|t)  =  J  A(tx,  T)z(t)dT 


W) 


which  minimizes  the  expected  squared  error  sum, 
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s 


(i  kit)  -^(t)]2J. 


A(t^,  t)  is  an  n  x  p  matrix  whose  elements  are  continuously 
differentiable  in  both  arguments.   x(ti   t)  is  an  unbiased  min- 
imum variance  linear  estimate  of  x(  t-,  )  .   That  is, 


x, 


E  \   x(t.)/  =E   $(t 


_^  ul 


t)   =  0 


A  necessary  and  sufficient  condition  that  x(t,  t)  be  a  min- 
imum variance  estimator  for  x(t-.)  is  that  the  matrix  function 
A(t1,  t)  satisfy  the  matrix  form- of  the  Wiener-Hopf  equation 
(Ref.  17). 

X 


cov 


[x(tx),  z(cr)J  -  J    A(tx,  t)cov[z(t),  z((T)dT  =  0 

^0  J 


(50) 


for  tQ  s?  <T   <  t  . 


Since 
cov 


hit]),    z(cr)]  =  cov["$(t1|t)  +  x(t-L  ]  t),  z(cr)] 

=  cov[x(t1  |t),  z(<r)J  +  covjfCt-L  |  t),  z(cr)] 

t 

j  J    A(tx,  T)z(T)z»(cr)dT:j  +  cov^^  |  t)  ,  z(cr)J 
I  tn  '   . 


=  E 


and  assuming  that  the  interchange  of  expectation  and  integration 
is  a  valid  operation,  yields 


covfxUjJ  ,  z{(T)\ 


Ait,,    t)  E  f  z(t)z'(<7-)']  &i 

t0         l       ; 

[x(t2  |t),  z(<r)] 


+  cov 
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=     j       A(tx,    t)covrz(T)  ,    z(<r)J  dTT 

+   cov  x(t1   I  t)  ,    z(cr)J 


Hence   the  matrix  form  of   the  Wiener-Hopf   equation   reduces    to 
cov^t-L  |  t),    z((T)J    =0  for   t0^CT  <  t  (£l) 


This    is   equivalent   to    saying 

,t 


x(t1   |  t)    =    j        £(*!*    T:)z(T)dTJ 


^0 
will  be  optimal  estimator  for  x(t-j_)    if  and  only  if  A(t-,,  t) 

satisfies  {SD  • 

Proof: 

Let  *£  denote  the  space  generated  by  the  random  vectors 
x(t-,).   The  subspace  W  is  generated  by 

■  t 


w(tx) 


=  J       B(tx,  t)z 
Jt0 


(T)dT 


(52) 


where  B(t,  t)  is  an  m  x  p  matrix  whose  elements  are  continuously 
diff erentiable  in  both  arguments.   Let 


wQ  =  x(t-j_  j  t) 
The  orthogonal-projection  lemma  implies  that 
(x  -  Wq,  w)  =  0 
(X  -  Wq,  w)   =  Ei 

=  E 


r^s  -  so 


I  [x(t2)  -  x(tx  Itjjw'd^)] 
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=  E  jj   ^(t1  [  t)z'(<r)B'(t,  cr)d'cr) 
=    J     E/x(t1  |  t)z'(<r)j  B'(t1,cr)dcr 


0 

t 

^0 


cov 


x(tx 


t),  z(<r) 


B'  (tlf  a-)dcr 


=  o 


Then   cov^x(t1   t)  ,  z(<r)J  =0   establishes  the  sufficiency  of 
the  condition,  because  B  is  an  arbitrarily  selected  matrix  sub- 
ject only  to  the  differentiability  conditions.   Let 

B(t1,  er)  =  covfxU-L  |  t),  zfcr)]  , 

then  the  matrix  B  B'  is  nonnegative  definite.   Then  the  in- 
tegral is  zero  only  if  B(t,<T)  vanishes  identically  for  all 
t()  ~  °~  <  t  •   Therefore  the  necessity  condition  is  verified. 

A  canonical  form  for  the  optimal  filter  can  be  derived  from 
the  Wiener-Hopf  equation.   Let  t±   =   t,  then  the  Wiener-Hopf 
equation  becomes 


cov 


fx(t),  z(cr)l  =  f     Ait,    t)  covfzCT),  z(<r)ldT 

J      h0  L-        -     J 


(S3) 


Differentiating  both  sides  with  respect  to  t  and  interchanging 
the  order  of  operations  of  differentiation  and  expectation, 
then  the  left-hand  side  is 

2  r  -]  r   3      r 

-   cov  x(t),  z(<T)   =  E 

2t        L-    -J    [d 


—  covfx(t),  z((T)l  =  e(  — .    rx(t)z'(<T)l 
dt         L        J     I  5t  L~   ~   J 


=  Ej  [P(t)x(t)  +  G(t)u(t) 


«n 


. 
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=  F(t)E/x(t)  .z«(<r)]  +  G(t)E/u(t)z'(cr)} 

By  assumption,  u(t)  is  independent  of  v(er)  and  x(o~)    when 
<T  <t.      Hence 

E  {u(t)z'(<0}  =  0_  . 


The  matrix  form  of  the  Wiener-Hopf  equation  yields 

■t 

% 

Hence 


l[x(t)z'(<r)J  =  f     A(t,  t)cov[z(t),  z  (<r)]  dT  . 


cov 


dt 


x(t),  z((T) 

t 


"I 


=  J      F(t)A(t,  t)covz(t),  z(<r)]dT 
t0 


(5k) 


The  derivative  of  the  right  side  of  equation  (53)    becomes 

■   0       ft  r  -, 

—   I   A(t,  t)cov  z(t),  z(ar) 
at    k0  l- 


dT 


since 
cov 


=  /   —  A(t,  t)covJz(t),  z(CT)  dT 
+  A(t,  t)cov[z(t)  ;  z(<r)J 

[z(t),  z(cr)] 

=  cov[H(t)x(t)  +  v(t),  z(o-)] 

=  H(t)covj~x(t),  z(<r)]  +  cov[v(t),  zfcT)] 

=  f  H(t)A(t,  t)cov[z(t),  z(<T)]  dT  . 


(55) 


(56) 


Combining  equations  (55)    and  (5°),  the  derivative  of  the  right 
side  of  equation  {53)    is 
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-     [       A(t,    tOcovFzCt;)  ,    z(C~)  1  dt 
t     /to  L 


3      /t 

3   0   /  Uq 


t  p^ 

0 


■  A(t,  t)  +  A(t,  t)H(t)A(t,  t) 
tn  U?t 


cov 


z(t),  z((T) 


dT 


(57) 


Combining  the  results  of  equation  (%i\)    and  equation  (57)  yields 

A    r  d 

F(t)A(t,  t)  A(t,  t) 

>t0  L  dt 

-  A(t,  t)H(t)A(t,  t)"J  covTzCt),  z((T)]dT  =  0 


for  tQ  ^r  <r  <   t 


(58) 


If  the  optimal  matrix  A  is  a  solution  of  the  differential 
equation, 

d 

F(t)A(t,  t) A(t,  t)  -  A(t,  t)H(t)A(t,  t)  =  _0 

dt 

f or  t0  <   t  ^  t  (59) 

then  equation  (58)  is  certainly  satisfied. 

If  R(t)  is  positive  definite  in  the  interval  tQ  <£  T  ^  t, 
then  the  covariance  matrix  cov[__z(t),  .z^JJ  is  also  positive 
definite  in  this  interval   tQ  &   t  <  t  .   Then  the  condition 
(59)  is  necessary. 

The  optimal  filter  is  generated  by  differentiating 

A, 
x(t  ]  t)  =   j    A(t,  T)z(T)dT 
/t0 

with  respect  to  t  and  combining  the  result  with  equation  (59) . 

Thus 
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dx(t|t)    d       ft 

=  —     A(t,  t)z(t)c3t 

dt      dt     't0 


t    2 


to  dt 


<t  r- 


A(t,  T)z(T)dT  +  A(t,   t)z(t) 


=  J       [p(t)A(t,  t)  -  A(t,  t)H(t)A(t,  Tr)1z(T)dT 

+  A(t,  t)z(t)  . 


Let  A(t,  t)  =  K(t),  then 


dx(tlt) 


dt 


,t    f 
=     /    |Z(t)A(t,  T)  -  K(t)H(t)A(t,  T)  z(T)]dT 

+  K(t)z(t) 


=  fp(t)  -  K(t)H(t)l  f     A(t,  T)z(T)dT  +  K(t)z(t) 

=  P(t)  -  K(t)H(t)]^(t  |  t)  +  K(t)z(t) 

=  F(t)£(t  |  t)  +  K(t)[z(t)  -  H(t)x(t  |  t)]        (Ic) 

Substituting  x(t  |  t)  =  x(t)  -  x(t  |  t)  into  equation  (Ic) 

d 

—  Tx(t)  -  f(t  |  t)J  =  [P(t)  -  K(t)H(t)]  [x(t) 

-  x(t  |  t)]  +  K(t)z(t) 

dx(t)    dx(t  |  t) 

— =  Z(t)  -  K(t)H(t)|  x(t)  -  x(t  !  t)| 

dx        dx      L  ~   J  L~      ~   '   -1 

+  K(t)[H(t)x(t)  +  v(t)J 


Then 
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dx(t!t) 


dt 


=  [fU)  -  K(t)H(t)~]x(t  |  t)  +  G(t)u(t)  -  K(t)v(t)  (He) 


The  next  step  is  to  derive  an  explicit  form  of  the  optimal 
gain  (or  optimal  weighting  matrix)  K(t).   It  is  obtained  from 
the  Wiener-Hopf  equation 

Tx(t),  z(CT)  I  -  )   A(t,  t)cov 

/t0  ' 

-t 

to 


covl 


z(t)  ,  z(CT)  dT  =  0 


)1 


r[x(t),  z(cr)J  -  j      A(t,  T)covrz(t)-,  z(cr)[  dT 

=  covFx(t),  jicr)    +   v(cT)J  -  J      A(t,  t)cov|j(t)  +  v(t), 

^O  "1 

V_(CT)   +  v(<T)J  dT 


=  cov[x(t),  y_(cr)J  -  y  A(t,  t)  Jcov[v_(t),  Z(<T)]. 

+  R(t)6(t  -  0~))  dT 

_  4- 

=  covhc(t),  £(CT)J  -  J      Mt,  t)covRt(t),  y_(crf|dT 

-  A(t,<r)R(CT) 


=  0 

t0^  <r  <  t 


Hence 


cov 


[x(t),  Z((T)]  -  j      A(t,  t)cov[j(t),  l((T)]dT 


=  A(t,  o-)r(o-) 


(6o) 


Both  sides  of  equation  (60)  are  continuous  functions  of  ^  . 
Therefore  that  equality  holds  also  at  <T   =  t  .   Hence 


3k 


A(t,     t)R(t)    =    cov[x(t   |  t)    +   x(t    ft),    V_(tf] 

-  y       A(t,    t)oovJj(t),    V_(t)]dT 
/to 

~x(t  [t),    y_(t)]    +   cov[x(t   |  t),    y_(t)J 

-  f     A(t,    -u)covfi(T),    yJtfjdT 
'tn 


=   cov 


since 


COvfx(t|t),     y_(t)J     =    E{/       A(t,     T)z(T)Z'(t)dT 


=     (     A(t,   tJeJzKJe'U)]  dT 

=     f"  A(t,   t)cov[j(t),    y_(tfJdT 

/tn 


Therefore 

K(t)R(t)    =  A(t,    t)R(t)    =   covfot   |  t),    y_(t) 
since 

covPx(t   |  t),    y_(t)] 

=   cov[x(t   |  t),    H(t)(x(t   |  t)    +  x(t  |  t))| 

=   cov[x(t   |  t)  ,    x(t  |  t)]H'  (t)    +   cov|"x(t   |  t)  ,    x(t  |  t) 

=  P(t)H'(t)    . 


(61) 


H'(t) 


-1, 


And  R(t)    is   assumed   positive    definite,    so    that  R      (t) 
exists.      Multiplying  both   sides   of   equation    (6l)    by  R"    (t),    yields 


K(t)    =  P(t)H' (t)R-1(t) 


(Hid) 


The  general  solution  of  equation  (He)  is 


3S 


tit  !  t)  =tc(t,  t0)g(jb1|t0) 


+  f      1    (T,  t)  [g(t)u(t)  -  K(T)v(T)ldT  (62) 

yt0   c         L  ~    J 

where  t  (t,  i)    is  the  common  transition  matrix  of  (Ic)  and 
(lie).   Then  one  can  derive  the  variance  equation 

P(t)  =  cov[x(t  |  t),  x(t  |  t)J 

Since  x( tQ  |  tQ)    is    independent   of  u(t)    and  v(t)    as    tQ  ^  t   ^  t, 

this   becomes 

P(t)    =rc(t,    t0)cov[x(t0|t0),    x(t0|t0)]^c'(t,    t0) 


+ 


E{/     ¥0(*,   t)[o(«c)u(t)    -  K(-r)v(T)]dT 


J0 
X  - 


•      f     ru'(T:')G'(T;»)    -   v»(T')K'(T')]y    »(t,    T  ■ )  dT  '} 
%  L  J    °  ) 


Hence 

p(t)   -  V  (t,  t0)p(t0)V  <(t,   t0) 


=     f     L(t,   t)["  f'  G(T)Efu(T)u'(T')}G'(n:')cf/  '(t,    t  < )  dx 
/t0      C  L  7t     ~  I"       -  J  -  -c 


+      f     K(t)e(v(t)v'(t')]k'('u)^    '(t,    T')dT'ldT 
/t0  <•  j  C  J 

=     f     l!l0(t,   t)|    f     G(t)Q(t)6(t   -  t«)G»(t')Y   <(t,    T')dT' 

;to  L/t0 

/      K(t)R(t)6(t    -  t')K'(t')    •    V    r(t,    T')dT»ldT 
/t0  c  J 


+ 


36 


=  /   !L(t'  T)  k(^)Q(^)G'(^)  +  K(t)R(t)K'(t)  V"(t,  T)dT 
/t()   C        L  -    J-C 

(63) 
Differentiating  with  respect  to  t,  the  left  side  of  equation 
(63)  is 

d 

dt 


5-[z(t)  -rcu,  t0)p(t0)V0.(t,  to)] 


dp   d  r 


dt    dt  I- 
This  equation  becomes 
d 
dt 

dp 

dt 


lc(t,  t0)p(t0)rc«(t,  t0)] 


5-[z(t)  -rc(t,  to)P(to)rc.(t,  to)] 


-^-[p(t)  - K(t)H(t)]yc(t,  t0)p(t0)y.(t,  t0) 

*,(t,  t0)P(t0)^'(t,  t0)[p-(t)  -H'(t)K'(t)]     (61;) 


since 
d 

dt 
d 

d- 

The  derivative  of  the  right  side  of  equation  (63)  is 
d    A, 

% 


£c(t'    tQ)    =   [p(t)    -  K(t)H(t)]^c(t,    tQ) 


~-c'(t'    t0)    s!4f(t*    tyfe'dO    -H'U)K'(t)] 


=  —     f       L(t,    t)[g(t;)£(t)G'(t) 

dt     't„  L-  ~    _ 


'0 


-t    <? 


+    K(T)R(T)K«(T)     V^    <(t,     T)dT 


=     f     —    tj*>    ^)Tg(t:)S(^)G'(^) 

/t0  <?t      c  L~ 

+    K(T)R(T)K'(-0]  V/U,     T)dT 
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+  ^c(t,  t)[G(t)Q(t)G'(t)  +  K(t)R(t)K»(t)]g  «(t,  t) 

=  [p(t)  -  K(t)H(t)]  f       JJc(t,  t)[g(t)Q(t)G'(t) 

+K(t)H(t)K'(T)lf  i(t,  T)dT 

.t 
+  /   £c(t,  ^)[g(t)£(t)G'(t)  +  K(t)R(t)K'(t)] 

V0'(t,  T)d-r[p'(t)  -  H'(t)K'U)] 
+  [G(t)£(t)G'(t)  +  K(t)R(t)K'(t)] 
=  [p(t)  -  K(t)H(t)][p(t)  -  ^(t,  t0)P(t0)^c'(t,  tQ)] 

+  [p(t)  -  Vc(t,  tQ)P(t0)^c'(t,  t0)j-  [P'(t)  -  H'(t)K'(t)] 


1 


+  [G(t)Q(t)G'(t)  +  K(t)R(t)K'(t)j 


(65) 


Combining  equations  (61;)  and  (65)  and  solving  the  result  for 
dp/dt  yields 
■  d 
dt 


P(t)  =  [p(t)  -  K(t)H(t)]p(t)  +  [p(t)]-[p»(t)  -  H'(t)K*(t)l 
+  [G(t)Q(t)G'(t)  +  K(t)R(t)K'(t)j 


Substituting  K(t)  =  P(t)H'(t)R  X(t)   into  the  above  equation 
yields 
dP(t) 


dt 


=  F(t)P(t)  +P(t)F'(t)  -  P(t)H'(t)R_1(t)H(t)P(t) 
+  G(t)g(t)G'(t) 


(IVc) 


Now  one   can   evaluate    the   following  covariance  matrix. 


covjjx(t   ft),    u(t)J   =  covf^c(t,    t0)xU0|t0) 
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/   ^  (t,  T)[g(t)u(t)  -  K(T)v(T)]dT,  U(T)1 

h0    c     L  J 


^/ 


However,  since  x(t0|t0)  is  independent  of  u(t)  and  v(t)  is  also 
independent  of  u(t),  hence  this  equation  can  be  reduced. 


[$c(t  |  t),  u(t)] 

=  cov   f     VQ(t,  T)G(T)u(T:)dT,  u(t)  1 
L  tQ 

=  f     Sc(t,  T)G(T)E[u(T:)u,(t)jdT 


f    Y   (t,  t)G(t)£(t)6(t  -  t)dT 
>tn    C 


=  -   G(t)§(t) 
2 


cov 


5c(t  I  t),  z(t)l 
=  cov  I  x(  t  I  t) ,  H(  t)  f£(  t  |  t)  +  x(  t  |  t))  +  v(  t) 


The  above  equation  can  be  reduced 

covtx( t  [  t) ,  z( t) J 

=  covj~x(t  I  t)  ,  x(t  I  t)  H'  (t)  +  cov  x(t  I  t)  ,  v(t)] 

=  P(t)H'(t)  -  ( '  ^  (t,  T)K(T)E/v('r)v'(t))d'i: 


-t 

'to 

ft 


P(t)H'(t)  -  (     ^  (t,  t)K(t)R(t:)6(t  -  t)dt 


=  P(t)H'(t)  -  1/2  K(t)R(t) 


since 


cov[x(t  I  t) ,  x(t  I  t) J  =  0 
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Substituting  P(t)H'(t)  =K(t)R(t)   into  the  above  equation 
yields : 


^  -,    1 

cov[x(t  ft),  z(t)J  =  -  K(t)R(t) 


J    2 


To  derive  the  formula  for  prediction,  it  is  noted  that  if  t,  ~>   t, 

then 


xH-jJ  -I(tlf  t)x(t)  +   f  I(tx,  T)G(x)u(t)dT 


,t 
/tQ 

Since  u(t)  for  t  <  t  £  t]_  is  independent  of  x(t)  in  the  interval 

,t 


t0  «£  t  ^  t,  the  term  f  i(t-,,  t)  G(t)u(t)  dT  vanishes  in 

/t0 

Efx(tx      *)}     »    therefore 

x(tx   [  t)    =  |(tx,    t)x(t   |  t)  for   tx  5-  t0  (Vc) 

If  t-j_  <  t,  one  does  not  know  whether  xK)  and  u(t)  would  be 
independent  in  the  required  interval.   Hence  the  same  conclusion 
cannot  be  drawn  for  the  interpolation  case. 

These  five  equations  are  discussed  in  the  following 
paragraphs. 

The  differential  (difference)  equation  for  optimal  filter 
is 

x(t  +  ljt)  =  |(t  +  1,  t)x(t|t  -  1)  +  K(t) 

•[z(t)  -  H(t)x(t|t  -  1)]  (Id) 

dx(tjt) 

=  F(t)x(t  |  t)  +  K(t)fz(t)  -  H(t)x(t  [  t)  !       (Ic) 

dt  -    L-      -   -    '   J 

It  governs  the  optimal  filter,  which  is  excited  by  the  observed 
signals  and  generates  the  best  linear  estimate  of  the  message. 


ko 


The  initial  state  x(t0|  tQ)  =  0,  since  initially  there  are  no 
observations  and  the  mean  of  x(tQ)  is  zero.   It  is  noted  here 
that  the  first  term  of  the  equation  is  the  estimate  of  x  based 
on  the  observation  before  t.   The  quantity  in  the  brackets  is 
then  the  difference  between  the  observation  and  the  estimated 
value  of  the  x(t),   The  weighting  matrix  K(t)  weights  the  error 
signal  to  produce  an  increment  to  be  added  to  the  estimate;  the 
value  of  x(t  +  l|t)  is  known  immediately  after  time  t,  but  it 
is  not  needed  to  compute  the  next  estimate  until  time  t  +  1; 
this  delay  makes  it  possible  to  compute  K(t)  by  digital  com- 
puter.  The  general  block  diagram  of  the  optimal  filter  is 
shown  in  Figs.  3  and  1±.      It  is  a  feedback  system.   The  input  is 
the  actual  observation  from  which  the  latest  estimate  of  the 
observable  vector  is  subtracted.   The  difference  is  then  multi- 
plied by  the  weighting  matrix  K(t)  to  decrease  the  error  of  the 
new  estimate  x. 

The  prediction  formula  for  Gauss-Markov  sequence  or  pro- 
cess is: 

x(tx|t)  =  |(tx,  t)x(tlt  -  1)   for  tx>  t  +  1      (Vd) 
x(t.jjt)  =|(t1,  t)x(t|t)       for  tx  £  t  (Vc) 


These  are  the  formulas  for  prediction  only  and  are  also  shown 
in  Pigs.  3  and  Ij.. 

The  differential  (difference)  equation  for  optimal  esti- 
mation error  is 
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X 


_(t  +  l|t)  =  |(t  +  1,  t)x(t  +  l|t)  +  4(t  +  1,  t)u(t) 

rH(t)x(t|t  -  1)  +  v(t)j 
dx(t|  t) 


dt 


:  F(t)x(t|t)  +  G(t)u(t)  -  K(t) 
H(t)x(t|t)  +  v(t)] 


-  K(t) 
(lid) 

(He) 


It  governs  the  optimal  estimation,  error.   General  block  diagram 
of  the  optimal  estimation  error  is  shown  in  Figs.  5  and  6. 
Optimal  gain  formulas  are: 


K(t)  =  [|(t  +  1,  t)P(t]t  -  l)H'(t)J 


[H(t)P(t jt  -  l)H'(t)  +  R(t) 


-1 


-1, 


K(t)  =  P(t)H'(t)R  ±(t) 


(Hid) 
(IIIc) 


They  are  gains  of  the  optimal  filter  expressed  in  terms  of  the 
error  covariance  matrix  P(t).   The  magnitude  of  the  elements  of 
K(t)  are  indicative  of  the  amount  of  information  contained  in 
the  signal  z(t)  at  time  t.   In  Figs.'  3  and  k,    it  is  shown  that 
the  optimal  properties  of  the  filter  depend  upon  the  proper 
selection  of  the  weighting  matrix  K(t). 
Variance  equations  are: 

P(t  +  l|t)  =  |(t  +  1,  t)jP(t|t  -  1)  -  [P(t|t  -  l)H'(t)J 
H(t)P(t|t  -  l)H'(t)  +  R(t)]"1 
H(t)P(t|t  -  1)]  |'(t  +  1,  t) 
+  £(t  +  1,  t)Q(t)A'(t  +  1,  t)  (IVd) 

dP(t)  n 

— =  F(t)P(t)  +P(t)F'(t)  -  P(t)H'(t)R"1(t)H(t)P(t) 

dt  ,„.  > 

+  G(t)Q(t)G'('t)  (IVc) 
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They  are  nonlinear  differential  or  difference  equations  which 
govern  the  variance  matrix  of  the  errors  of  the  optimal  linear 
estimate.   No  _z(t)  terms  are  involved  in  the  variance  equations. 
This  means  the  conditional  covariance  matrix  does  not  depend  on 
the  values  of  conditional  variables.   Hence  the  variance  equa- 
tion is  independent  of  the  observations  z(t).  This  is  a  special 
property  of  the  Kalman  filter.   Since  the  gains  of  the  optimal 
filter  are  governed  by  the  variance  equation,  this  means  the 
structure  of  the  optimal  filter  can  be  determined  independently 
of  the  random  data  z(t).   Given  z(t),  tQ  <  t  ^  t,  one  can  com- 
pletely determine  the  conditional  distribution  of  the  random 
variable  x(t)  for  all  t]_  $■  t  from  equations  (I),  (III),  (IV), 
and  (V) .   This  is  due  to  the  gaussian  and  markovian  assumptions. 
The  variance  equation  is  just  another  form  of  the  Wiener-Hopf 
equation.   Its  solution  yields  the  covariance  matrix  of  the  min- 
imum filtering  error  which  contains  all  the  necessary  informa- 
tion for  the  design  of  the  optimal  filter.   The  variance  equa- 
tion is  also  closely  related  to  the  calculus  of  variations  as 
will  be  discussed  later.   The  initial  state  of  P  is  given  as 
part  of  the  problem  statement.   Then  the  solution  of  the  vari- 
ance equation  is  determined.   If  P_o(t)  is  nonnegative  definite, 
then  P(t)  is  also  nonnegative  definite  for  all  t  <?  tQ.   In  the 
stationary  case,  |(t  +1,  t) ,  A(t   +  1,  t),  H(t),  R(t),  Q(t)  are 
constants.   Hence  the  covariance  matrix  is  independent  of  time 
and  can  be  calculated  readily.   The  chief  task  in  filtering 
theory  is  the  study  of  the  variance  equation.   This  is  difficult 
because  the  variance  equation  is  nonlinear.   But  it  is  a  very 
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special  one  in  that  it  is  a  matrix  Riccati  equation.   A  gener- 
alized matrix  Riccati  equation  has  the  form 

—  +  -   -3-  +  -  $k  ~   -1-  "  -2  =  -  * 

where  G^,  G2,  G^,  and  Gr  are  n-j_  x  n-^,    u^^   x  n2,  n2  x  n-j_,  and 
n2  x  n2  matrices,  respectively.   It  is  a  classical  (or  scalar) 
Riccati  equation  when  n-j_  =  n2  =  1.   It  has  the  same  form  as  the 
variance  equation,  with  n^  =  n2  =  n;  (Ref.  19),  which  is  well 
known  from  the  calculus  of  variations.   An  exact  formula  for 
the  solution  of  the  variance  equation  will  be  derived  from  the 
quadratic  Hamiltonian  function.   Consider  the  Hamiltonian  func- 
tion ft     as  defined  by  (Ref.  2). 


ft  =   -  -  ||G'(t)x||2Q(t)    -    s   F<  (t)x  +  -  ||H(t)s||2 


2  2 

The  canonical  differential  equations  of  Hamilton  are: 

dx  2]{  , 

—  =  grads/L  =  -£-  =  -  F'(t)x  +  H'(t)R_1(t)H(t)s 
dt  2s  ~ 

ds         ,,  -2ji 

—  =  grad  ft  =   =  G(t)Q(t)G'(t)x  +  P(t)s 

dt       x     $x  ~       ~       ~  ~       ~       ~ 


(66) 


(67) 


Let  S(t),  X(t)  be  the  matrix  solutions  of  the  system  (67)  which 
satisfy  the  initial  conditions. 

X(0)  =  I 

(68) 

s(t0)  =  p(t0)  =  p0 

Assume 

S(t)  =  P(t)X(t)  (69) 

Substituting  equation  (69)  into  equation  (67),  one  obtains 


JU-8 


d-U)  =  [_  pi(t)  +  H'(t)R-1(t)H(t)P(t)]x(t) 


(70) 


dt 
ds(t)    dP(t) 


x(t)  +  P(t) 


dt 


dt 


dx(t) 


dt 


=  [G(t)Q(t)G'(t)  +  F(t)P(t)]x(t) 
Combining  equations  (70)  and  (71),  one  gets 


(7D 


dP(t) 


dt 


=P(t)F'(t)  +  F(t)P(t)  -  P(t)H'(t)R"1(t)H(t)P(t) 


+  G(t)Q(t)G'(t) 


(72) 


It  is  identical  with  equation  (IVc),  hence  the  identity  S(t)  - 
P(t)X(t)  is  valid.   Let  T  be  the  matrix  of  coefficients  of  the 

system  (67) . 

\-   F'(t)    C(t) 


T  = 


D(t)     P(t) 

L   —  —      *J 


where 


-1, 


C(t)  =  H'(t)R"x(t)H(t) 
D(t)  =  G(t)Q(t)G'(t) 


Hence  the  system  becomes 


1 


dx 
dt 
d_s 
dt 


r-F'(t)    C(t) 
D(t)     F(t) 


x(t) 


(73) 


s(t) 


->    u 


J 


Let  e(t,  t0)  be  2n  x  2n  transition  matrix  of  the  system  (73). 
e(t,  t0)  be  partitioned  into  four  n  x  n  submatrices  as  follows 


e(t,  t0)  = 


©]_-[_(  t,  tQ)    ©12^'  *"()) 


©91(t,    t(\)  6rjo(t,    tn) 


-21 


-22 


J0' 


where 


d9 

—  =  T  0  . 

dt 


Thus 


x(t) 


H(t) 


ill^*  *o)    ©12(t>  to) 
€>2]_(t,  tQ)    ©22  ^t,  tQ) 


1 


x(t0) 


s(t0) 


k9 


(7k) 


Substituting  the  initial  condition  (68)  into  the  above  equa- 
tion, one  has 


x(t)  =  exl(t,  t0)  +  e12(t,  t0)P0 
s(t)  =  e21(t,  t0)  +  e22(t,  t0)P0 


(75) 


Combining  equations  (75)  and  (69),  one  gets  an  exact  solution 
of  variance  equation  (IVc). 


p(t)  =  [e21(t,  t0)  +  e22(t,  t0)P0J 
[£1]L(t,  t0)  +  e12(t,  t0)PQ 


(76) 


EXAMPLE  OF  KALMAN  FILTERING 


Suppose  a  particle  leaves  the  origin  at  time  t  =  0  and 
moves  thereafter  with  a  constant  but  unknown  velocity  of  zero 
mean  and  known  variance.   The  position  of  the  particle  is 
measured,  the  data  being  contaminated  by  the  addition  of  white 
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noise.   What  is  the  optimal  estimate  of  the  position  and 
velocity  of  the  particle  at  the  time  of  the  last  measurement? 

Let  x-j_(t)  be  the  position  and  Xp(t)  the  velocity  of  the 
particle.   Then  the  problem  is  represented  by  the  model 

X]_(t  +  1)  =  x-^t)  +  x2(t) 
x2(t  +  1)  =  x2(t) 
Z-jjt)  =  Xl(t)  +  v(t)  . 

That  is, 

x(t  +  1)  =  |(t  +  1,  t)x(t)  +  A(t  +  1,  t)u(t) 
and 

z(t)  =  H(t)x(t)  +  v(t) 


where 
|(t 

The  additional  conditions  are 
*1  =  t,    tQ  =  0 


fl   ll  r    1 

+  i,   t)  =       ,  A(t)   =  o  ,   H(t)  =  |i,  ol 
lo-  lj  J 


E^Xl(0)j  =  EJx12(0)|  =  e/x2(0)]  =  0 
E^x22(0)j  =  a2 


and 


E/v2(t)|  =  R(t)  =  b2,         for  all  t. 

First,  one  wants  to  predict  the  position  and  velocity  of  the 
particle  one  step  ahead. 
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P(0)  = 


=  E 


~xn2(0) 


P(l)  = 


cov|x(0),  x(l)J 

x]_(0)x2(0) 
|x2(0)x1(0)       2S22(0). 

(j)jP(O)  -  P(0)H'  [hp(o)h'  +  r"]"1  hp(o)]  0 


0    0 
0   a2 


J 


2  2 

a*1  a 

2  2 

a^  a^ 


Then  from  equation  (iVd),  the  covariance  matrix  is 
P(t  +  1)  =  |jP(t)  -  P(t)H'(t)  [H(t)P(t)H'(t)  +  R(t)]_1H(t)P(t)]5 
b2Sp11(t)+2p12(t)+p22(t)  +  P1i(t)p22(t)-p12(t)"|  , 

b2rp12(t)+p22(t)]  +  rpi:L(t)p22(t)-p22  (t)J 


Pi:L(t)+v 


b2  rP22(t)+Pi2(tll  +  Lpll(t)p22(t)"p12(t)1     ' 
b2p22(t)    +    rp11(t)p22(t)-p12(t)"| 


(77)J 


By  simple  substitutions  of  t  =  0,  t  =  1,  t  =  2,  .  .  .,  into 
equation  (77),  it  can  be  shown  that 

2 

p11(t)p22(t)  =  p12(t) 


Then 


pxi(t)  =  t  p22(t)  . 


P12(t)  =  P21(t)  =  tp22(t) 


Substituting  these  equations  into  equation  (77),  the  covariance 
matrix  is: 


I 
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P(t+1)  = 


.2/4.2 


t2p22(t)+b2 


b^p22(t) 
t2p22(t)    +  b< 


b*(t*  +   2t   +   l)p22(t) 


b2(t  +   l)p22(t) 


(t   +   ir        (t  +   1) 
(t   +   1)  1 


b2(t+Dp22(t) | 


b^P22(t) 


for   t  ^  0 


where 


p22(t   +  1)    = 


b2p22(t) 


t2p22(t)    +  b2 


for   t  5?  0    . 


A s sume 


c(t)    =   t2  + 


p22(t) 


and 


c(t+l)    =    (t+1)2   + 


=    (t+ir   + 


P22(t+1) 


t2   + 


=    (t+lp   + 


p22(t) 


b2P22(t) 
t2P22(t)+b2 


=    (t+lp  +   c(t) 


where 


b2 
c(0)    =  —    . 


Hence 


P(t+1) 


>2  |(t  +  D2    (t  +  i)| 


c(t)    I  (t   +    1) 


for  t  ^  0. 


Hence 


- 


S3 


P(t)  = 


c(t  -  1; 


t 


t 
i 


for  t  >   1 


where 


c(0) 
c(t) 


=  c(t  -  1)  +  t' 


Then  K(t)  is  found  from  equation  (Hid). 

|HP(t)Hr  +  R(t)| 


K(t)  = 


|.  P(t)H' 
1 


tl   +  c(t  -  1) 


t(t  +  1) 

t 


x(t  |  t)  =  \   x(t|t  -  1)  +  KdOpzU)  -  H(t)x(t|t  -  1)J 


where 


$(t0) 


x(0) 


=  0  . 


If  t»l,  then 


c(t)  =  c(t  -  1)  +  td 

=   c(t  -  2)  +  t2  +  (t  -  l)2 
=  c(0)  +  t2  +  (t  -  l)2  +  .  . 


.  +  1 


=  c(0)  + 


t(t  +  1) (2t  +  1) 


=  c(0)  + 


t3 

3 


b2   t3 
I2*" 


. 


5k 


Hence 


P(t)  = 


b2 

t2 
t 

t 

1 

b2         t? 

~2+7 

and 


K(t)  £ 


t2  + 


1 


a  b 


b2  + 


,2t3 


b2   t3 


t(t  +  i)l 
t 


t2      t 


t       i 


b2  + 


*2t3 
3 


for  t»l  . 


A 


It  is  obvious  that  k-^ — }0  and  k2  — ^  0  for  t»l.   This  means 
that  as  the  number  of  observations  becomes  large,  the  estimates 
^(t  +  1  |  t  +  1)  and  x2(t  +  1  |  t  +  1)  will  depend  on  the  pre- 
vious estimate.   It  will  be  shown  later  that  for  t>>l,  the 
discrete  observation  case  is  essentially  the  same  as  prediction 
based  on  continuous  observations.   Shinbrot  (11),  who  treated 
this  problem  on  the  continuous  d'ata  case  using  a  completely  dif- 
ferent method,  obtained  the  same  results  as  here.   (See 
Appendix  A . ) 

If  the  position  of  the  particle  is  continually  observed  in 
the  presence  of  additive  white  noise,  the  problem  is  repre- 
sented by  the  model, 


d*l 

dt 


dx2 

dt 


*2 


=  0 


dx 

—  =  F(t)x(t)  +  G(t)u(t) 

dt  ~ 


Z_    =    X-i  +  V 
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where 
F(t)  = 


ro   1] 


,   G(t)  =  £,   and   H  =  [l,  o]  . 


_0    0_. 
The  initial  state  of  covariance  matrix  is 


P(0)    =   covT^Uq),    Xq^oL 
dP(t) 


0        0 
0      a2 


-1, 


dt 


=  PP   +   PF'    -    PH'IT^HP   +   GQG' 


0      1  !    j~pll      P12 


_o    oj  Lp21    p22^ 

_p21   p22|  [oj  [yj 


Pn   P12 
P21   P22. 

1 


2P12      P22 


22 


dpi;L        dp12 


1 
b2 


fl, 


'11 


0 


fo     0 

U  oj 

Pll    P12 
P21    P22 

p11p12 


J 


dt 


dt 


dp2-,         dp22 


dt 


Hence 


dt 


P11P21   P12p21 

t 

1 


2p12  "  ~2   PH     p22  "  "2  pllp12 


P2i  "  ~2  P11P21 
b 


T?  P12p21 
b*1 
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dp 


11 


dt 


=  2p 


12 


b2 


dp 


12 


dt 


dp21         PllP21 

=  P22 0 — 

dt  b2 
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dp 


22 


dt 


Pl2' 
1,2 


Using  Adam's  method  (5),  one  can  solve  these  nonlinear  differ- 
ential equations  by  digital  computer.   (See  Appendix  B.) 

One  can  get  an  exact  solution  of  the  variance  equation  from 
the  canonical  differential  equations  of  Hamilton. 


dx 


dt         2s 

ds      -dH 

—  =  =  G(t)Qjt)G'(t)x  +   P(t)s 

dt         2  2S 

In   this    case 

dx            f*0      0"1 

*1 

+ 

^ 

i-      J 

-     ~i 

«  =  -[i    OJ 

22 

0 

| 

s2 

ds        -B'H          To      1 

1 

! 

£1" 

* 

dt          dx            [_ 

D      0 

L 

r2. 

Then  the  matrix  of  coefficients  of  the  Hamiltonian  equa- 
tions is: 


0   0  —  0 
D2 
-10    0   0 


I  = 


0   0    0   1 
0   0    0   0 


and  the  corresponding  transition  matrix  is 


$Q 


6(t,    t0)    =   exp  T(t   -    t0)    = 


L- 

i=0 


[T(t    -    to)]1 


=  T°   +  Tt  +.  T2t2/2   +  T3t3/6  +    .    . 


as    tg   =  0 


where 


,0   _ 


I   = 


10  0  0 
0  10  0 
0  0  10 
0      0      0      1 


0      0      —      0 
b2 


T      = 


T^    = 


■1  0 

0  0 

.  o  0 

0  0 

0  0 

0  0 

0  0 


0      0 
0    •  1 

o    o_ 
1  • 

0      — 
b2 

-1 

—       0 

b2 


0 
0 


0 
0 


T3  = 


0      0      0         0 

-1 

0      0      0      — 

b2 

0      0      0        0 

0      0      0         0 


T^4-  =   t^   =   T      = 


.    .    =   0 


Hence 
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1   0 


t 

t2 

b2 

2b2 

-t2 

-t3 

9(t)  =   "      2b2   6b2 
0   0    1     1 
0   0    0     1 


The  four  submatrices  of  6  are  as  follows: 


eu(t)  - 

921(t)    = 

0      0 

[o     0 

e12(t)  = 


e22(t)  = 


;/b2 
-t2/2b2 


1 


t/2b2  t 

-t3/6b2J 


1  1 
0   1 


Prom  equation  (76),  one  gets 


P(t)  = 


re21(t)  +  e22(t)p0]  re11(t)  +  e12(t)P0l 


-l 


=  a2 


0    t 
0    1 


a2t2 


2b2 
6b2  -  a2t3 


-t 


6b2 


-1 


a2b2 


2*3 


b2 


+ 


s^t 


3 


i  t    t 


t    1 


K(t)  =  P(t)H'(t)R-1(t) 


-1/ 
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a2b2 


b^  + 


»2t3 

3 


b2  + 


32t3 
3 


t2  tl 


t   l 


1  i  "I 

!  b2 


JL°J 


This  is  the  same  result  as  before. 


CONCLUSION 


A   Kalman  filter  is  constructed  using  matrix  theory  and  the 
state  and  state  transition  approach  to  linear  system  theory  to 
solve  the  Wiener  problem.   Kalman' s  derivations  are  not  affected 
by  the  nonstationarity  of  the  process  or  the  finiteness  of  avail- 
able data.   Hence  Kalman  filtering  can  be  applied  to  both  the 
stationary  and  the  nonstationary  cases.   An  important  feature 
of  the  Kalman  filter  is  that  the  structure  of  the  filter  can  be 
determined  independent  of  the  random  data  inputs.   It  provides 
the  error  analysis  independent  of  any  data  inputs.   Further,  it 
performs  this  error  analysis  in  a  very  efficient  way.   The  high- 
speed digital  computer  plays  an  important  role  in  this  error 
analysis.   Theoretically  speaking,  Kalman  filtering  enlarges 
the  realm  of  Wiener  filtering  and  clarifies  the  fundamental 
assumptions  and  their  consequences.   Practically  speaking, 
Kalman  filtering  is  well  adapted  to  digital  computer  usage, 
while  Wiener  filtering  is  not. 


6l 


A  Kalman  filter  can  be  applied  to  estimate  the  position, 
velocity,  and  altitude  of  a  terrestrial  or  space  navigator. 
For  example  (Ref .  13) ,  Kalman  filter  estimates  of  the  position 
and  velocity  of  a  space  vehicle  can  be  used  for  the  purpose  of 
midcourse  guidance.   The  source  of  information  is  the  sequence 
of  measurements  of  three  space  angles.   The  measurements  cannot 
be  determined  exactly  because  of  the  presence  of  additive  white 
noise.   The  equation  of  motion  is  nonlinear,  but  it  can  be  lin- 
earized by  Taylor's  expansion.   The  Kalman  filter  solves  the 
estimation  problem  using  a  computation  scheme  which  weights  the 
incoming  measurements  in  an  optimal  sense  to  produce  an  up-to- 
date  optimal  estimate  of  position  and  velocity.   This  computa- 
tion scheme  is  readily  implemented  by  a  digital  computer.   The 
restrictions  on  the  speed  of  the  digital  computer  are  that  the 
time  required  to  complete  the  computation  cycle  must  be  less 
than  the  time  interval  between  successive,  observations.   Hence 
it  is  clear  that  a  Kalman  filter  is  heavily  dependent  upon 
digital  computer  technology. 
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APPENDIX  A  ' 

Sinbrot  solved  example  1  in  his  paper  "Optimization  of 
Time-varying  Linear  Systems  with  Nonstationary  Inputs"  with 
correlation  functions  as  follows. 

The  input  to  the  system  is 

f±(t,  p,  OJ  =  fm(t,  p)  +  -fn(t,  QJ 
where  f  and  f   are  message  and  noise,  respectively. 
fm(t,  p)  =  Pt 

P  is  the  unknown  velocity  of  the  particle.   The  mean  square  value 

p 

of  P  is  a  .   The  hoise  is  white  and  is  independent  of  the  mes- 
sage.  Hence 

fe^  T)  =  ^W^  T)  =  ^di^  T>  =  t^»  T>  =  s2tT 

^nn(t,  T)  =  b26(t  -  T) 

tf±1(t,    t)  =  ^mm(t,  t)  +  #nn(t,  t)  =  a2tT  +  b26(t  -  T) 
The  cross  correlation  of  input  and  desired  output  is 
#di(t,T)  =  ^dm(t,T)  =  |  h(t,(r)^rnrn(T,cr)d<r+  b2h(t,t) 

for  0  ■<  T  <  t  . 
Hence 

'0 

and  multiplying  by  T  on  both  sides  and  integrating  with  respect 
to  i   from  zero  to  t  yields 


a2tT  =  a2 


t  /  cr  h(t,  cr  )d<r+  b2h(t,  t) 
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/   a2tt2dT  =  f     a2i2     f    <T h(  t ,<r)  &<T    dT  +  /  b2h(t,T)dt 
'0  'o       Ly0  J       'o 


Let 


j(t)  =   o-h(t,<r)ao- . 


Hence 


t3    .  t3 
3 


a2t  —  =  a2  —  j(t)  +  b2j(t) 


2t^ 


j(t)  = 


a£t 


a2t3  +  3b2 


Shinbrot  pointed  out  that  the  optimal  system  response  is 


h(t,T)  =  a^x 


ft  -  j(t) 


3a2tT 


i2t3  +  3b2 


t  £  T 


and  the  mean  souare  error  is 


€2  =  b2h(t,  t) 


3  a2b2t2 
i2t3  +  3b2 


a2b2t2 


a2t3 
3 


+  b< 
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Kalman  approached  the  Wiener  problem  from  the  "state"  point 
of  view  and  thought  of  linear  filtering  as  orthogonal  projection 
in  Hilbert  space.   The  physical  relationship  among  the  state 
x(t),  the  white  noises  u,  v,  and  the  observation  _z(t)  is  de- 
scribed by  a  linear  system  which  is  specified  by  a  system  of 
first-order  linear  differential  (difference)  equations.   The 
statistical  description  of  the  white  noises  u,  v  is  given  as 
part  of  the  problem  statement.   The  problem  is  that  one  observes 
_z(t)  over  some  interval  of  time,  tQ  ^  t  ^  t,  and  one  wants  to 
find  the  optimal  estimate  x(t-i)  °^  2S^l)  given  these  observa- 
tions.  This  optimal  estimate  x(t^)  of  x/,t]_)  will  minimize  the 
expected  loss 

E  {L(x(t]_)  -  xU-jj)}   =  e[£   [^(t-L)  -  x^t^]2}    . 

In  the  case  where  t-,  -^  t,  this  is  called  the  data-smoothing 
(interpolation)  problem:  where  t-,  =  t,  this  is  called  the  fil- 
tering problem;  where  t-j_  >  t,    this  is  called  the  prediction 
(extrapolation)  problem.   Using  the  conditional  distributions 
and  expectations,  Kalman  transformed  the  Wiener-Hopf  integral 
equation  into  a  nonlinear  differential  equation  of  the  Riccati 
type  which  is  closely  related  to  the  Hamiltonian  differential 
equations  of  the  calculus  of  variations.   The  solution  of  this 
nonlinear  differential  equation  yields  the  covariance  matrix  of 
the  minimum  filtering  error  which  will  determine  the  structure 
of  the  optimal  filter  independent  of  the  input  data.   Kalman '  s 
approach  is  not  affected  by  nonstationarity  of  the  process  x(t) 
to  be  estimated,  or  the  finiteness  of  available  data.   Hence  a 


. 


Kalman  filter  can  be  applied  to  both  cases,  stationary  and  non- 
stationary.   Practically  the  main  contribution  of  the  Kalman 
filter  is  that  it  provides  numerical  procedures  well  adapted 
to  digital  computer  usage. 


